Tagging a corpus of Malay texts, and coping with ‘syntactic drift’

Authors

  • Gerry Knowles
  • Zuraidah Mohd Don
Abstract

The structure of Malay presents the corpus linguist with an extremely interesting problem. At high syntactic levels, the language is familiar enough, and one can talk of direct objects in transitive constructions, and even of agentless passives. The dominant sentence order is SVO. Parsing at this level is therefore relatively straightforward. The problem is at lower levels, where Malay patterns quite differently from Indo-European languages. If the linguist tries to process Malay using categories and techniques designed for Indo-European, then it comes across as at best confusing and at worst in a state of chaos. Malay is neither confusing nor in chaos; but it does need to be analysed using techniques which are sensitive to its own patterns.

Similar resources

Processing Natural Malay Texts: a Data-driven Approach

This research represents the first attempt to produce a working system for the automatic processing of texts of Bahasa Melayu ‘Malay’. At the heart of the system is an integrated relational lexical database called MALEX, which draws on the experience of working on English and other languages, but which is specifically tailored to the conditions of Malay. The development of the database is from ...

A Syntactically and Semantically Tagged Corpus of Russian: State of the Art and Prospects

We describe a project aimed at creating a deeply annotated corpus of Russian texts. The annotation consists of comprehensive morphological marking, syntactic tagging in the form of a complete dependency tree, and semantic tagging within a restricted semantic dictionary. Syntactic tagging uses about 80 dependency relations. The syntactically annotated corpus counts more than 28,000 sentences...

Feature extraction in opinion mining through Persian reviews

Opinion mining deals with an analysis of user reviews for extracting their opinions, sentiments and demands in a specific area, which can play an important role in making major decisions in such area. In general, opinion mining extracts user reviews at three levels of document, sentence and feature. Opinion mining at the feature level is taken into consideration more than the other two levels d...

Part of Speech Tagger for Malay Language Based on Words Morphology

Mohd Pouzi Hamzah, Syarifah fatem Na’imah Binti Syed Kamaruddin. School of Informatics and Applied Mathematics, Universiti Malaysia Terengganu, 21030 Kuala Terengganu, Terengganu, Malaysia. Abstract: Part of Speech (POS) tagging is an essential task in pre-processing for text process...

An Approach to Proper Name Tagging for German

This paper presents an incremental method for the tagging of proper names in German newspaper texts. The tagging is performed by the analysis of the syntactic and textual contexts of proper names together with a morphological analysis. The proper names selected by this process supply new contexts which can be used for finding new proper names, and so on. This procedure was applied to a small Ge...

Publication date: 2003